
Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity.

Internal identifier: 000224 (Main/Exploration); previous: 000223; next: 000225

Authors: Juan J. Lastra-Díaz [Spain]; Josu Goikoetxea [Spain]; Mohamed Ali Hadj Taieb [Tunisia]; Ana García-Serrano [Spain]; Mohamed Ben Aouicha [Tunisia]; Eneko Agirre [Spain]

Source:

Data in brief [2352-3409]; 2019.

RBID: pubmed:31516953

Abstract

This data article introduces a reproducibility dataset that allows the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which presents the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. The implementation of all our experiments, as well as the gathering of all raw data derived from them, was based on the software implementation and evaluation of all methods in the HESML library (Lastra-Díaz et al., 2017), and their subsequent recording with Reprozip (Chirigati et al., 2016). The raw data consists of a collection of data files gathering the raw word-similarity values returned by each method for each word pair evaluated in any benchmark. Raw data files were processed by running an R-language script to compute all evaluation metrics reported in (Lastra-Díaz et al., 2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, and to generate automatically all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools needed to reproduce from scratch all our experimental data, statistical analysis and reported data. Finally, our reproducibility dataset provides a self-contained experimentation platform that allows new word similarity benchmarks to be run by setting up new experiments including methods or word similarity benchmarks not considered in the survey.
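
The abstract names Pearson and Spearman correlation and a harmonic score among the reported metrics. As a rough illustration only (the dataset itself relies on an R script, which is not reproduced here), the following Python sketch computes these three values for one method on one benchmark. The sample values, the function names, and the definition of the harmonic score as the harmonic mean of the two correlations are assumptions made for this example; they are not taken from this record.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def harmonic_score(r, rho):
    """Harmonic mean of the two correlations (assumed definition)."""
    return 2 * r * rho / (r + rho)


# Hypothetical human judgements and method outputs for five word pairs.
human = [3.92, 3.84, 3.05, 1.18, 0.42]
method = [0.95, 0.80, 0.85, 0.30, 0.10]

r = pearson(human, method)
rho = spearman(human, method)
print(f"Pearson={r:.3f} Spearman={rho:.3f} harmonic={harmonic_score(r, rho):.3f}")
```

Computing Spearman as the Pearson correlation of average ranks keeps the sketch free of third-party dependencies and handles ties without a separate formula.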

DOI: 10.1016/j.dib.2019.104432
PubMed: 31516953
PubMed Central: PMC6736772

Links to previous steps (curation, corpus...)

to stream Main, to step Corpus: 000216
to stream Main, to step Curation: 000216

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity.</title>
<author><name sortKey="Lastra Diaz, Juan J" sort="Lastra Diaz, Juan J" uniqKey="Lastra Diaz J" first="Juan J" last="Lastra-Díaz">Juan J. Lastra-Díaz</name>
<affiliation wicri:level="3"><nlm:affiliation>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid</wicri:regionArea>
<placeName><settlement type="city">Madrid</settlement>
<region nuts="2" type="region">Communauté de Madrid</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Goikoetxea, Josu" sort="Goikoetxea, Josu" uniqKey="Goikoetxea J" first="Josu" last="Goikoetxea">Josu Goikoetxea</name>
<affiliation wicri:level="1"><nlm:affiliation>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country</wicri:regionArea>
<wicri:noRegion>Basque Country</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hadj Taieb, Mohamed Ali" sort="Hadj Taieb, Mohamed Ali" uniqKey="Hadj Taieb M" first="Mohamed Ali" last="Hadj Taieb">Mohamed Ali Hadj Taieb</name>
<affiliation wicri:level="1"><nlm:affiliation>Faculty of Sciences of Sfax, Tunisia.</nlm:affiliation>
<country xml:lang="fr">Tunisie</country>
<wicri:regionArea>Faculty of Sciences of Sfax</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Garcia Serrano, Ana" sort="Garcia Serrano, Ana" uniqKey="Garcia Serrano A" first="Ana" last="García-Serrano">Ana García-Serrano</name>
<affiliation wicri:level="3"><nlm:affiliation>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid</wicri:regionArea>
<placeName><settlement type="city">Madrid</settlement>
<region nuts="2" type="region">Communauté de Madrid</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Aouicha, Mohamed Ben" sort="Aouicha, Mohamed Ben" uniqKey="Aouicha M" first="Mohamed Ben" last="Aouicha">Mohamed Ben Aouicha</name>
<affiliation wicri:level="1"><nlm:affiliation>Faculty of Sciences of Sfax, Tunisia.</nlm:affiliation>
<country xml:lang="fr">Tunisie</country>
<wicri:regionArea>Faculty of Sciences of Sfax</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Agirre, Eneko" sort="Agirre, Eneko" uniqKey="Agirre E" first="Eneko" last="Agirre">Eneko Agirre</name>
<affiliation wicri:level="1"><nlm:affiliation>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country</wicri:regionArea>
<wicri:noRegion>Basque Country</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2019">2019</date>
<idno type="RBID">pubmed:31516953</idno>
<idno type="pmid">31516953</idno>
<idno type="doi">10.1016/j.dib.2019.104432</idno>
<idno type="pmc">PMC6736772</idno>
<idno type="wicri:Area/Main/Corpus">000216</idno>
<idno type="wicri:explorRef" wicri:stream="Main" wicri:step="Corpus" wicri:corpus="PubMed">000216</idno>
<idno type="wicri:Area/Main/Curation">000216</idno>
<idno type="wicri:explorRef" wicri:stream="Main" wicri:step="Curation">000216</idno>
<idno type="wicri:Area/Main/Exploration">000216</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity.</title>
<author><name sortKey="Lastra Diaz, Juan J" sort="Lastra Diaz, Juan J" uniqKey="Lastra Diaz J" first="Juan J" last="Lastra-Díaz">Juan J. Lastra-Díaz</name>
<affiliation wicri:level="3"><nlm:affiliation>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid</wicri:regionArea>
<placeName><settlement type="city">Madrid</settlement>
<region nuts="2" type="region">Communauté de Madrid</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Goikoetxea, Josu" sort="Goikoetxea, Josu" uniqKey="Goikoetxea J" first="Josu" last="Goikoetxea">Josu Goikoetxea</name>
<affiliation wicri:level="1"><nlm:affiliation>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country</wicri:regionArea>
<wicri:noRegion>Basque Country</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Hadj Taieb, Mohamed Ali" sort="Hadj Taieb, Mohamed Ali" uniqKey="Hadj Taieb M" first="Mohamed Ali" last="Hadj Taieb">Mohamed Ali Hadj Taieb</name>
<affiliation wicri:level="1"><nlm:affiliation>Faculty of Sciences of Sfax, Tunisia.</nlm:affiliation>
<country xml:lang="fr">Tunisie</country>
<wicri:regionArea>Faculty of Sciences of Sfax</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Garcia Serrano, Ana" sort="Garcia Serrano, Ana" uniqKey="Garcia Serrano A" first="Ana" last="García-Serrano">Ana García-Serrano</name>
<affiliation wicri:level="3"><nlm:affiliation>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid</wicri:regionArea>
<placeName><settlement type="city">Madrid</settlement>
<region nuts="2" type="region">Communauté de Madrid</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Aouicha, Mohamed Ben" sort="Aouicha, Mohamed Ben" uniqKey="Aouicha M" first="Mohamed Ben" last="Aouicha">Mohamed Ben Aouicha</name>
<affiliation wicri:level="1"><nlm:affiliation>Faculty of Sciences of Sfax, Tunisia.</nlm:affiliation>
<country xml:lang="fr">Tunisie</country>
<wicri:regionArea>Faculty of Sciences of Sfax</wicri:regionArea>
</affiliation>
</author>
<author><name sortKey="Agirre, Eneko" sort="Agirre, Eneko" uniqKey="Agirre E" first="Eneko" last="Agirre">Eneko Agirre</name>
<affiliation wicri:level="1"><nlm:affiliation>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country, Spain.</nlm:affiliation>
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country</wicri:regionArea>
<wicri:noRegion>Basque Country</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">Data in brief</title>
<idno type="eISSN">2352-3409</idno>
<imprint><date when="2019" type="published">2019</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This data article introduces a reproducibility dataset with the aim of allowing the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which introduces the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. The implementation of all our experiments, as well as the gathering of all raw data derived from them, was based on the software implementation and evaluation of all methods in HESML library (Lastra-Díaz et al., 2017), and their subsequent recording with Reprozip (Chirigati et al., 2016). Raw data is made up by a collection of data files gathering the raw word-similarity values returned by each method for each word pair evaluated in any benchmark. Raw data files were processed by running a R-language script with the aim of computing all evaluation metrics reported in (Lastra-Díaz et al., 2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, as well as to generate automatically all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools to reproduce from scratch all our experimental data, statistical analysis and reported data. Finally, our reproducibility dataset provides a self-contained experimentation platform which allows to run new word similarity benchmarks by setting up new experiments including other unconsidered methods or word similarity benchmarks.</div>
</front>
</TEI>
<pubmed><MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM"><PMID Version="1">31516953</PMID>
<DateRevised><Year>2020</Year>
<Month>10</Month>
<Day>01</Day>
</DateRevised>
<Article PubModel="Electronic-eCollection"><Journal><ISSN IssnType="Electronic">2352-3409</ISSN>
<JournalIssue CitedMedium="Internet"><Volume>26</Volume>
<PubDate><Year>2019</Year>
<Month>Oct</Month>
</PubDate>
</JournalIssue>
<Title>Data in brief</Title>
<ISOAbbreviation>Data Brief</ISOAbbreviation>
</Journal>
<ArticleTitle>Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity.</ArticleTitle>
<Pagination><MedlinePgn>104432</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1016/j.dib.2019.104432</ELocationID>
<Abstract><AbstractText>This data article introduces a reproducibility dataset with the aim of allowing the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which introduces the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. The implementation of all our experiments, as well as the gathering of all raw data derived from them, was based on the software implementation and evaluation of all methods in HESML library (Lastra-Díaz et al., 2017), and their subsequent recording with Reprozip (Chirigati et al., 2016). Raw data is made up by a collection of data files gathering the raw word-similarity values returned by each method for each word pair evaluated in any benchmark. Raw data files were processed by running a R-language script with the aim of computing all evaluation metrics reported in (Lastra-Díaz et al., 2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, as well as to generate automatically all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools to reproduce from scratch all our experimental data, statistical analysis and reported data. Finally, our reproducibility dataset provides a self-contained experimentation platform which allows to run new word similarity benchmarks by setting up new experiments including other unconsidered methods or word similarity benchmarks.</AbstractText>
</Abstract>
<AuthorList CompleteYN="Y"><Author ValidYN="Y"><LastName>Lastra-Díaz</LastName>
<ForeName>Juan J</ForeName>
<Initials>JJ</Initials>
<AffiliationInfo><Affiliation>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid, Spain.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Goikoetxea</LastName>
<ForeName>Josu</ForeName>
<Initials>J</Initials>
<AffiliationInfo><Affiliation>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country, Spain.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Hadj Taieb</LastName>
<ForeName>Mohamed Ali</ForeName>
<Initials>MA</Initials>
<AffiliationInfo><Affiliation>Faculty of Sciences of Sfax, Tunisia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>García-Serrano</LastName>
<ForeName>Ana</ForeName>
<Initials>A</Initials>
<AffiliationInfo><Affiliation>NLP &amp; IR Research Group, ETSI de Informática (UNED), Universidad Nacional de Educación a Distancia, Juan Del Rosal 16, 28040, Madrid, Spain.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Aouicha</LastName>
<ForeName>Mohamed Ben</ForeName>
<Initials>MB</Initials>
<AffiliationInfo><Affiliation>Faculty of Sciences of Sfax, Tunisia.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y"><LastName>Agirre</LastName>
<ForeName>Eneko</ForeName>
<Initials>E</Initials>
<AffiliationInfo><Affiliation>IXA NLP Group, Faculty of Informatics, UPV/EHU, Manuel Lardizabal 1, 20018, Donostia, Basque Country, Spain.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList><PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic"><Year>2019</Year>
<Month>08</Month>
<Day>26</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo><Country>Netherlands</Country>
<MedlineTA>Data Brief</MedlineTA>
<NlmUniqueID>101654995</NlmUniqueID>
<ISSNLinking>2352-3409</ISSNLinking>
</MedlineJournalInfo>
<KeywordList Owner="NOTNLM"><Keyword MajorTopicYN="N">Experimental survey</Keyword>
<Keyword MajorTopicYN="N">HESML</Keyword>
<Keyword MajorTopicYN="N">Information content models</Keyword>
<Keyword MajorTopicYN="N">Ontology-based semantic similarity measures</Keyword>
<Keyword MajorTopicYN="N">Reprozip</Keyword>
<Keyword MajorTopicYN="N">Word embedding models</Keyword>
<Keyword MajorTopicYN="N">WordNet</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData><History><PubMedPubDate PubStatus="received"><Year>2019</Year>
<Month>07</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="revised"><Year>2019</Year>
<Month>08</Month>
<Day>11</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted"><Year>2019</Year>
<Month>08</Month>
<Day>16</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez"><Year>2019</Year>
<Month>9</Month>
<Day>14</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed"><Year>2019</Year>
<Month>9</Month>
<Day>14</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline"><Year>2019</Year>
<Month>9</Month>
<Day>14</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>epublish</PublicationStatus>
<ArticleIdList><ArticleId IdType="pubmed">31516953</ArticleId>
<ArticleId IdType="doi">10.1016/j.dib.2019.104432</ArticleId>
<ArticleId IdType="pii">S2352-3409(19)30787-5</ArticleId>
<ArticleId IdType="pii">104432</ArticleId>
<ArticleId IdType="pmc">PMC6736772</ArticleId>
</ArticleIdList>
</PubmedData>
</pubmed>
<affiliations><list><country><li>Espagne</li>
<li>Tunisie</li>
</country>
<region><li>Communauté de Madrid</li>
</region>
<settlement><li>Madrid</li>
</settlement>
</list>
<tree><country name="Espagne"><region name="Communauté de Madrid"><name sortKey="Lastra Diaz, Juan J" sort="Lastra Diaz, Juan J" uniqKey="Lastra Diaz J" first="Juan J" last="Lastra-Díaz">Juan J. Lastra-Díaz</name>
</region>
<name sortKey="Agirre, Eneko" sort="Agirre, Eneko" uniqKey="Agirre E" first="Eneko" last="Agirre">Eneko Agirre</name>
<name sortKey="Garcia Serrano, Ana" sort="Garcia Serrano, Ana" uniqKey="Garcia Serrano A" first="Ana" last="García-Serrano">Ana García-Serrano</name>
<name sortKey="Goikoetxea, Josu" sort="Goikoetxea, Josu" uniqKey="Goikoetxea J" first="Josu" last="Goikoetxea">Josu Goikoetxea</name>
</country>
<country name="Tunisie"><noRegion><name sortKey="Hadj Taieb, Mohamed Ali" sort="Hadj Taieb, Mohamed Ali" uniqKey="Hadj Taieb M" first="Mohamed Ali" last="Hadj Taieb">Mohamed Ali Hadj Taieb</name>
</noRegion>
<name sortKey="Aouicha, Mohamed Ben" sort="Aouicha, Mohamed Ben" uniqKey="Aouicha M" first="Mohamed Ben" last="Aouicha">Mohamed Ben Aouicha</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MaghrebDataLibMedV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000224 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000224 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MaghrebDataLibMedV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:31516953
   |texte=   Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:31516953" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MaghrebDataLibMedV1

This area was generated with Dilib version V0.6.38.
Data generation: Thu Jun 17 16:21:50 2021. Site generation: Thu Jun 17 21:51:18 2021

	Server on medical data and libraries in the Maghreb
	Warning: this site is under development. It was generated by automated means from raw corpora, so the information has not been validated.
